Unifying Adversarial Training Algorithms with Data Gradient Regularization
Authors
Abstract
Many previous proposals for adversarial training of deep neural nets have included directly modifying the gradient, training on a mix of original and adversarial examples, using contractive penalties, and approximately optimizing constrained adversarial objective functions. In this article, we show that these proposals are actually all instances of optimizing a general, regularized objective we call DataGrad. Our proposed DataGrad framework, which can be viewed as a deep extension of the layerwise contractive autoencoder penalty, cleanly simplifies prior work and easily allows extensions such as adversarial training with multitask cues. In our experiments, we find that the deep gradient regularization of DataGrad (which also has L1 and L2 flavors of regularization) outperforms alternative forms of regularization, including classical L1, L2, and multitask, on both the original data set and adversarial sets. Furthermore, we find that combining multitask optimization with DataGrad adversarial training results in the most robust performance.
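As a rough sketch of the kind of objective the abstract describes (the notation below is ours, not lifted from the paper), a DataGrad-style regularizer penalizes the gradient of the per-example loss with respect to the input:

\mathcal{L}(\theta) = \mathbb{E}_{(x,y)}\!\left[ \ell(\theta; x, y) + \lambda\, R\!\big(\nabla_{x}\, \ell(\theta; x, y)\big) \right],

where R(\cdot) = \|\cdot\|_1 gives the L1 flavor, R(\cdot) = \|\cdot\|_2^2 gives the L2 flavor, and \lambda > 0 trades data fit against the sensitivity of the loss to input perturbations.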
Similar resources
Unifying Adversarial Training Algorithms with Flexible Deep Data Gradient Regularization
We present DataGrad, a general back-propagation style training procedure for deep neural architectures that uses a deep Jacobian-based regularization penalty. It can be viewed as a deep extension of the layerwise contractive auto-encoder penalty. More importantly, it unifies previous proposals for adversarial training of deep neural nets – this list includes directly modifying the gradient, ...
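A minimal sketch of how such an input-gradient penalty can be computed with nested automatic differentiation (a generic JAX illustration under our own assumptions; the linear placeholder model, the toy data, and the weight lam are not taken from the paper):

import jax
import jax.numpy as jnp

def task_loss(params, x, y):
    # Placeholder model: a single linear layer with softmax cross-entropy.
    logits = x @ params["W"] + params["b"]
    log_probs = jax.nn.log_softmax(logits)
    return -jnp.mean(log_probs[jnp.arange(y.shape[0]), y])

def datagrad_objective(params, x, y, lam=0.01):
    base = task_loss(params, x, y)
    # "Data gradient": gradient of the loss with respect to the inputs, not the weights.
    input_grad = jax.grad(task_loss, argnums=1)(params, x, y)
    penalty = jnp.sum(jnp.abs(input_grad))   # L1 flavor; jnp.sum(input_grad ** 2) for L2
    return base + lam * penalty

# Toy shapes just to show the call pattern.
key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (8, 32))                      # batch of 8 inputs, 32 features
y = jnp.zeros(8, dtype=jnp.int32)                        # dummy labels
params = {"W": 0.01 * jax.random.normal(key, (32, 10)),  # 10 output classes
          "b": jnp.zeros(10)}

# Differentiating the regularized objective w.r.t. the parameters backpropagates
# through the inner input gradient (a double back-propagation style computation).
param_grads = jax.grad(datagrad_objective)(params, x, y)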
Virtual Adversarial Training: a Regularization Method for Supervised and Semi-supervised Learning
We propose a new regularization method based on virtual adversarial loss: a new measure of local smoothness of the output distribution. Virtual adversarial loss is defined as the robustness of the model’s posterior distribution against local perturbation around each input data point. Our method is similar to adversarial training, but differs from adversarial training in that it determines the a...
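For context, the virtual adversarial loss at an input x is usually written along the following lines (a sketch of the standard formulation, with the divergence D typically taken to be the KL divergence and \epsilon a perturbation radius):

\mathrm{LDS}(x, \theta) = D\big[\, p(\cdot \mid x, \hat{\theta}) \,\big\|\, p(\cdot \mid x + r_{\mathrm{vadv}}, \theta) \,\big],
\qquad
r_{\mathrm{vadv}} = \arg\max_{\|r\|_2 \le \epsilon} D\big[\, p(\cdot \mid x, \hat{\theta}) \,\big\|\, p(\cdot \mid x + r, \hat{\theta}) \,\big].

Because no label appears in this term, it can be evaluated on unlabeled data, which is what makes the method applicable to semi-supervised learning.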
Wasserstein Distributional Robustness and Regularization in Statistical Learning
A central question in statistical learning is to design algorithms that not only perform well on training data, but also generalize to new and unseen data. In this paper, we tackle this question by formulating a distributionally robust stochastic optimization (DRSO) problem, which seeks a solution that minimizes the worstcase expected loss over a family of distributions that are close to the em...
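In symbols, a Wasserstein DRSO problem typically takes the following form (a generic sketch; \hat{P}_n is the empirical distribution, W a Wasserstein distance, and \rho the radius of the ambiguity set):

\min_{\theta} \; \sup_{Q : \, W(Q, \hat{P}_n) \le \rho} \; \mathbb{E}_{\xi \sim Q}\big[ \ell(\theta; \xi) \big].

For suitable loss classes, this worst-case objective coincides with, or is bounded by, the empirical risk plus a norm-based regularization term, which is the kind of connection to regularization suggested by the paper's title.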
Gradient Regularization Improves Accuracy of Discriminative Models
Regularizing the gradient norm of the output of a neural network with respect to its inputs is a powerful technique, first proposed by Drucker & LeCun (1991), who named it Double Backpropagation. The idea was independently rediscovered several times since then, most often with the goal of making models robust against adversarial sampling. This paper presents evidence that gradient regularizatio...
On the regularization of Wasserstein GANs
Since their invention, generative adversarial networks (GANs) have become a popular approach for learning to model a distribution of real (unlabeled) data. Convergence problems during training are overcome by Wasserstein GANs which minimize the distance between the model and the empirical distribution in terms of a different metric, but thereby introduce a Lipschitz constraint into the optimiza...
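One widely used way to soften this Lipschitz constraint is a gradient penalty on the critic f (a sketch of the common formulation; \hat{x} denotes points sampled between real and generated samples and \lambda a penalty weight):

\mathcal{L}_{\mathrm{GP}} = \lambda \; \mathbb{E}_{\hat{x}}\big[ \big( \|\nabla_{\hat{x}} f(\hat{x})\|_2 - 1 \big)^2 \big],

which is added to the Wasserstein critic loss; the paper above is concerned with regularization schemes of this kind.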
Journal: Neural Computation
Volume: 29, Issue: 4
Pages: -
Year of publication: 2017